Chemical Named Entity Recognition: Improving Recall Using a Comprehensive List of Lexical Features
Abstract
As the number of published scientific papers grows every day, there is an increasing need for automated named entity recognition (NER) systems capable of identifying relevant entities mentioned in a given text, such as chemical entities. Since high precision is crucial for delivering useful results, we developed a NER method, Identifying Chemical Entities (ICE), which was tuned for precision. As a result, ICE achieved the second highest precision value in the BioCreative IV CHEMDNER task, but with significantly low recall values. This paper shows how the use of simple lexical features improved the recall of ICE while maintaining high levels of precision. Using a selection of the best features tested, ICE obtained a best recall of 27.2% for a precision of 92.4%.
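To make the phrase "simple lexical features" concrete, the sketch below computes a few surface features for a single token. The abstract does not list the features ICE actually uses, so the function name `lexical_features` and every feature in it are hypothetical illustrations of the general idea, not the authors' feature set.

```python
import re


def lexical_features(token: str) -> dict:
    """Return illustrative surface features for one token.

    Assumption: these features are examples only; they are not
    taken from the ICE system described in the abstract.
    """
    return {
        "lower": token.lower(),                   # case-normalized form
        "prefix3": token[:3].lower(),             # first three characters
        "suffix3": token[-3:].lower(),            # last three characters
        "length": len(token),                     # token length
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        # Greek letters are common in chemical names (e.g. "α-pinene")
        "has_greek": bool(re.search(r"[\u0370-\u03FF]", token)),
        "is_all_caps": token.isupper(),
        # Uppercase after the first character, as in formulas like "NaCl"
        "has_inner_upper": any(ch.isupper() for ch in token[1:]),
    }


if __name__ == "__main__":
    for tok in ["2-chloroethanol", "NaCl", "recall"]:
        print(tok, lexical_features(tok))
```

Features of this kind are typically fed to a sequence classifier as additional evidence for each token; which ones help recall without hurting precision has to be determined empirically.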
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named entity recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrid approaches. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvements in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Improving Persian Named Entity Recognition Using Kasre Ezafe
Named entity recognition is a process in which people's names, names of places (cities, countries, seas, etc.), organizations (public and private companies, international institutions, etc.), dates, currencies, and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Mongolian Named Entity Recognition System with Rich Features
In this paper, we first build a manually annotated named entity corpus of Mongolian. Then, we propose three morphological processing methods and study comprehensive features, including syllable features, lexical features, context features, morphological features and semantic features in Mongolian named entity recognition. Moreover, we also evaluate the influence of word cluster features on the ...
Recognition of chemical entities: combining dictionary-based and grammar-based approaches
BACKGROUND: The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized co...